- Title
- Heterogeneous ensemble combination search using genetic algorithm for class imbalanced data classification
- Creator
- Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo
- Relation
- ARC|DP120102576 | ARC|DP140104183 | ARC|FT120100060 | NHMRC|512423 http://purl.org/au-research/grants/arc/FT120100060
- Relation
- PLoS ONE Vol. 11, Issue 1
- Publisher Link
- http://dx.doi.org/10.1371/journal.pone.0146116
- Publisher
- Public Library of Science (PLoS)
- Resource Type
- journal article
- Date
- 2016
- Description
- Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble depends on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β)-k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI Machine Learning Repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases.
- Subject
- (α, β)-k feature set method; data classification; heterogeneous ensembles; GA-EoC
- Identifier
- http://hdl.handle.net/1959.13/1321452
- Identifier
- uon:24365
- Identifier
- ISSN:1932-6203
- Rights
- © 2016 Haque et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- Language
- eng
- Full Text
File | Description | Size | Format
---|---|---|---
ATTACHMENT02 | Publisher version (open access) | 2 MB | Adobe Acrobat PDF
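
The Description above outlines a genetic algorithm that searches over combinations of base classifiers and scores each candidate ensemble by majority voting. The following is a minimal sketch of that idea, not the authors' GA-EoC implementation. It assumes binary labels, that each base classifier's predictions (for example, out-of-fold predictions from cross-validation) are already available as lists, and that plain accuracy is the fitness measure. The function names, GA settings (tournament selection, one-point crossover, bit-flip mutation, elitism) and the tie-breaking rule are all illustrative assumptions.

```python
import random


def majority_vote(predictions, mask):
    """Combine 0/1 predictions of the selected classifiers by majority vote.

    predictions: one list of 0/1 predictions per base classifier.
    mask: one 0/1 flag per base classifier (1 = included in the ensemble).
    Ties are resolved in favour of class 0 (an assumption of this sketch).
    """
    selected = [p for p, keep in zip(predictions, mask) if keep]
    voted = []
    for i in range(len(selected[0])):
        votes_for_one = sum(p[i] for p in selected)
        voted.append(1 if 2 * votes_for_one > len(selected) else 0)
    return voted


def fitness(predictions, labels, mask):
    """Accuracy of the majority-vote ensemble; an empty ensemble scores 0."""
    if not any(mask):
        return 0.0
    voted = majority_vote(predictions, mask)
    return sum(v == y for v, y in zip(voted, labels)) / len(labels)


def ga_search(predictions, labels, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Search for a good classifier subset; needs at least two classifiers."""
    rng = random.Random(seed)
    n = len(predictions)

    def score(mask):
        return fitness(predictions, labels, mask)

    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best mask forward
        while len(next_population) < pop_size:
            # Tournament selection: the better of two random individuals.
            parent_a = max(rng.sample(population, 2), key=score)
            parent_b = max(rng.sample(population, 2), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation toggles membership of single classifiers.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Toy predictions from three hypothetical base classifiers.
    toy_labels = [1, 0, 1, 0, 1, 0]
    toy_predictions = [
        [1, 0, 1, 1, 1, 0],
        [1, 1, 0, 0, 1, 0],
        [0, 0, 1, 0, 1, 1],
    ]
    mask, accuracy = ga_search(toy_predictions, toy_labels)
    print("selected classifiers:", mask, "accuracy:", accuracy)
```

Encoding an ensemble as a bit mask keeps crossover and mutation trivial. In practice the predictions would come from cross-validated base classifiers trained on class-balanced samples, and a measure better suited to imbalanced data could replace accuracy.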